Code alienation

If you try to read code that you wrote three months ago, chances are that it will feel as if it was written by a total stranger: weird algorithms, strange variable names, apparently random hard-coded values, and a recurring "what's going on here?". This "code alienation" problem is even more acute when the code that you are reading is not your own!

  • The good news: everyone thinks like that.
  • The bad news: you have to work on it, although it does not take a great deal of effort.

What is the solution to this problem? Is there a way to make "code alienation" less worrisome or painful? To write code that can be read in the future, by you or by others, you need to document it.

What and Why. Simply.

The documentation of your code should give information about

  • what the code is supposed to do
  • why it does it that way

The first guideline mainly helps other developers use your code. In Python, this mostly consists of the so-called docstrings: text that holds information about the functions or objects to be used.

The second guideline helps people who read your code, or who try to update it, improve it, or fix bugs in it, by providing them with a rationale. This consists of comments that usually point out the functionality of chunks of code or explain the approach used.

However, these guidelines should be complemented with simplicity. Too many comments might not affect the performance of your program, but they are unnecessary and they clutter the code, which makes the program harder to read. They can also affect the person reading your code (remember, it might be your future self!):

  • the person might stop reading it, out of frustration.
  • they might misunderstand it.

Keeping the documentation as simple as possible (but not simpler!) should be the aim of any programmer. The same is true for the code: simple, clean code can reduce the number of comments needed, especially when meaningful names and structure arise from that simplification. Sometimes people use comments as a crutch for their poor understanding of the program or problem they are tackling.
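As a small illustration of how meaningful names can replace comments, compare the two versions below. This is only a sketch: the names f, SECONDS_PER_DAY and elapsed_days are hypothetical and were invented for this example.

```python
# Version 1: a comment is needed to decode the name and the magic number.
def f(t):
    # t is in seconds; 86400 is the number of seconds in a day
    return t / 86400


# Version 2: the names carry the same information, so the comment can go.
SECONDS_PER_DAY = 24 * 60 * 60


def elapsed_days(elapsed_seconds):
    """Convert a duration in seconds to days."""
    return elapsed_seconds / SECONDS_PER_DAY


print(elapsed_days(172800))
```

Both versions compute the same value, but only the second one can be understood at the place where it is called.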

Docstrings:

Quoting from PEP 257:

A docstring is a string literal that occurs as the first statement in a module, function, class, or method definition. Such a docstring becomes the __doc__ special attribute of that object.

This string should hold information about what the module, function, class, or method DOES: its purpose, the arguments it requires (if any), any options, its outputs, and so on.

For example, the following are the first lines of a function that constructs a complex number from its real and imaginary parts:

def complex(real=0.0, imag=0.0):
    """Form a complex number.

    Keyword arguments:
    real -- the real part (default 0.0)
    imag -- the imaginary part (default 0.0)
    """

    if imag == 0.0 and real == 0.0:
        return complex_zero
    ...


Docstrings consist of a piece of text (one or multiple lines) delimited by triple double-quotation marks. There are two kinds:

  • One-liners - Just that, a single line between triple quotation marks
  • Multiple lines - They start with a single line (like a one-liner) followed by a blank line, and then the rest of the explanation (as seen in the previous code example).
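The two kinds of docstring, and the way Python stores them, can be sketched as follows. The functions double and scale are hypothetical and exist only for this illustration; inspect.getdoc is part of the standard library.

```python
import inspect


def double(value):
    """Return twice the given value."""
    return 2 * value


def scale(value, factor=1.0):
    """Return the value multiplied by a factor.

    Keyword arguments:
    value -- the number to scale
    factor -- the multiplier (default 1.0)
    """
    return value * factor


# The docstring is stored in the __doc__ attribute of the function.
print(double.__doc__)

# inspect.getdoc() returns the docstring with its indentation cleaned up.
print(inspect.getdoc(scale))
```

Calling help(scale) in an interactive session displays the same text, which is why docstrings are the natural place to describe what a function does.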

Please, go and read PEP 257 now!

Comments:

Comments are complete sentences (in English, please!) preceded by the hash symbol (#). They are used to add information that cannot be easily conveyed by the code itself.

Quoting Jef Raskin:

[code] can’t explain why the program is being written, and the rationale for choosing this or that method. [It] cannot discuss the reasons certain alternative approaches were taken. For example:

:Comment: A binary search turned out to be slower than the Boyer-Moore algorithm for the data sets of interest, thus we have used the more complex, but faster method even though this problem does not at first seem amenable to a string search technique. :End Comment:

This comment not only names the technique used, but also explains why a simpler approach was not taken.
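In Python, a comment in the same spirit might look like the sketch below. The function unique_in_order is a hypothetical example, not taken from the quoted text; the point is that the comment records why the simpler alternative was rejected, which the code alone cannot say.

```python
def unique_in_order(items):
    """Return the items with duplicates removed, keeping first occurrences."""
    # Why not simply list(set(items))? A set does not preserve the order of
    # the items, and callers rely on the first occurrence of each item
    # staying in its original position. The `seen` set is kept only to make
    # the membership test fast.
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


print(unique_in_order([3, 1, 3, 2, 1]))  # prints [3, 1, 2]
```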

Also, make sure you treat comments as an integral part of your code. The comments should talk about what the code is doing and the rationale behind it. If the code changes enough to make the comments obsolete, change the comments accordingly.

Quoting from PEP 8:

Comments that contradict the code are worse than no comments. Always make a priority of keeping the comments up-to-date when the code changes!

Bad documentation

In the community there is some debate regarding comments in the code: specifically, how much documentation is necessary, what the comments should say, and so on.

For example, let's take a look at a BADLY DOCUMENTED piece of code:


In [1]:
import numpy as np #This imports numpy
x = np.linspace(0,2*np.pi,1000) # This creates an array of 1000 elements from 0 to 10, equally spaced, and stores it in x
for d in np.linspace(0,np.pi,10):  # For each small angle from 0 to pi in steps of 0.01, do...
    MxD = np.sin(x+d) # This creates an array MxD that stores the values of sin(x)
    #Prints the value of the integral of y
    print("The sum is {0:.3f}".format(sum(MxD*(x[1]-x[0]))))


The sum is -0.000
The sum is 0.002
The sum is 0.004
The sum is 0.005
The sum is 0.006
The sum is 0.006
The sum is 0.005
The sum is 0.004
The sum is 0.002
The sum is -0.000

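You can see that this code has a comment on every single line, yet it can still be considered badly documented. Why? It fails to deliver the crucial information: what is the code supposed to do, and why does it do it that way? Several comments also contradict the code: the array runs from 0 to 2pi rather than from 0 to 10, the loop uses 10 angles rather than steps of 0.01, and MxD stores sin(x+d) rather than sin(x).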

A better-documented version of the previous code would be:


In [2]:
""" Test of the integral of a sine in one period.

This script aims to check that the integral of a sine
function over a period of oscillation is 0, regardless
of the initial dephasing.

The integral is done for functions of the type sin(x+a), 
over x from 0 to 2pi (a period of oscillation), with
variable initial phase a.

Since it is a simple script, we use the Rectangle rule with 
1000 elements between 0 and 2pi.

We expect each output to be close to 0.
"""

import numpy as np

# The domain of integration is x = (0, 2pi). We use 1000 points.
# The test is performed for 10 phase angles in the range (0, pi).
x = np.linspace(0, 2*np.pi, 1000)
dx = x[1] - x[0]
angles = np.linspace(0, np.pi, 10)

for a in angles:
    y = np.sin(x + a)
    # Rectangle rule: add up rectangles of height y and width dx.
    integral = sum(y*dx)
    print("The integral is {0:.3f}".format(integral))


The integral is -0.000
The integral is 0.002
The integral is 0.004
The integral is 0.005
The integral is 0.006
The integral is 0.006
The integral is 0.005
The integral is 0.004
The integral is 0.002
The integral is -0.000

Food for thought: